Energy Efficient Deep Learning

  1. Lightweight network structure

    SqueezeNet, MobileNet, and ShuffleNet share the same idea: decouple the spatial convolution from the channel-wise (pointwise) convolution to reduce the number of parameters, sharing a similar spirit with Pseudo-3D Residual Networks, which decouple the temporal and spatial convolutions. SqueezeNet is serial while MobileNet and ShuffleNet are parallel. MobileNet is a special case of ShuffleNet when using only one group.
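
    A minimal PyTorch sketch of this factorization (layer sizes are illustrative): a standard $3\times 3$ convolution versus the MobileNet-style depthwise $3\times 3$ followed by a pointwise $1\times 1$.

    ```python
    import torch
    import torch.nn as nn

    c, d = 64, 128  # illustrative input/output channel counts

    # standard convolution: k*k*c*d parameters (bias omitted)
    standard = nn.Conv2d(c, d, kernel_size=3, padding=1, bias=False)

    # depthwise-separable factorization: one 3x3 filter per channel,
    # then a 1x1 convolution that mixes channels
    separable = nn.Sequential(
        nn.Conv2d(c, c, kernel_size=3, padding=1, groups=c, bias=False),  # k*k*c params
        nn.Conv2d(c, d, kernel_size=1, bias=False),                       # c*d params
    )

    x = torch.randn(1, c, 32, 32)
    assert standard(x).shape == separable(x).shape

    count = lambda m: sum(p.numel() for p in m.parameters())
    print(count(standard), count(separable))  # 73728 vs. 8768 parameters
    ```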

    Low-rank approximation ($k\times k\times c\times d \rightarrow k\times k\times c\times d' + 1\times 1\times d'\times d$) also falls into the above scope. The difference between MobileNet and low-rank approximation is whether the spatial convolution is depthwise (per-channel) or dense.
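
    The same factorization as a PyTorch sketch: a dense $k\times k$ convolution into $d'$ intermediate channels followed by a $1\times 1$ convolution back to $d$; the rank $d'$ is a hypothetical choice here, purely for illustration.

    ```python
    import torch
    import torch.nn as nn

    k, c, d, d_prime = 3, 64, 128, 16  # d_prime is the chosen rank (illustrative)

    full_rank = nn.Conv2d(c, d, kernel_size=k, padding=1, bias=False)   # k*k*c*d params
    low_rank = nn.Sequential(
        nn.Conv2d(c, d_prime, kernel_size=k, padding=1, bias=False),    # k*k*c*d' params
        nn.Conv2d(d_prime, d, kernel_size=1, bias=False),               # 1*1*d'*d params
    )

    x = torch.randn(1, c, 32, 32)
    assert full_rank(x).shape == low_rank(x).shape

    count = lambda m: sum(p.numel() for p in m.parameters())
    print(count(full_rank), count(low_rank))  # 73728 vs. 11264 parameters
    ```

    Unlike MobileNet, the first convolution here stays dense across input channels; setting groups=c on it would recover the depthwise variant.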

  2. Tweak network structure

    • prune nodes based on certain criteria (e.g., response value, Fisher information): requires a special implementation and can take up more space than expected due to the irregular network structure; a magnitude-based sketch follows below.
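
    A minimal sketch of criterion-based pruning, using weight magnitude as the simplest criterion; note the mask keeps the dense tensor shape, which is why unstructured sparsity saves no space without a sparse storage format or structured pruning. (torch.nn.utils.prune provides similar unstructured pruning out of the box.)

    ```python
    import torch
    import torch.nn as nn

    def magnitude_prune(layer: nn.Linear, sparsity: float = 0.5) -> torch.Tensor:
        """Zero out the smallest-magnitude weights; return the binary mask."""
        w = layer.weight.data
        k = max(1, int(sparsity * w.numel()))
        threshold = w.abs().flatten().kthvalue(k).values
        mask = (w.abs() > threshold).float()
        w.mul_(mask)  # pruned weights are zeroed but still stored densely
        return mask

    layer = nn.Linear(512, 256)
    mask = magnitude_prune(layer, sparsity=0.9)
    print(f"kept {int(mask.sum())} of {mask.numel()} weights")
    ```
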
  3. Compress weights

    • Quantization (fixed bit width): learn a codebook and encode weights as indices into it. Fine-tune the codebook after quantizing the weights, which averages the gradients of weights belonging to the same cluster. Extreme cases are binary and ternary nets, whose weights are quantized to {-1, 1} (resp. {-1, 0, 1}) with a different scaling factor $\alpha$ for each layer; a codebook sketch follows after this list.
    • Huffman coding (variable bit width): applied after quantization for further compression; a sketch also follows below.
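
    A sketch of codebook quantization for a single weight tensor, using k-means (scikit-learn here, purely for illustration) to learn a 16-entry codebook, i.e. 4-bit indices; the gradient-averaging fine-tuning step described above is omitted.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def quantize(weights, n_codes=16):
        """Replace each weight by its nearest codebook entry (16 codes = 4-bit indices)."""
        km = KMeans(n_clusters=n_codes, n_init=10).fit(weights.reshape(-1, 1))
        codebook = km.cluster_centers_.ravel()       # n_codes float cluster centers
        codes = km.labels_.astype(np.uint8)          # small integer index per weight
        return codebook, codes.reshape(weights.shape)

    w = np.random.randn(64, 64).astype(np.float32)
    codebook, codes = quantize(w)
    w_hat = codebook[codes]                          # de-quantized weights
    print(np.abs(w - w_hat).mean())                  # quantization error
    ```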
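
    And a sketch of Huffman-coding the quantized indices with Python's standard heapq: frequent codebook indices get shorter bit strings, so the average bit width drops below the fixed-width code.

    ```python
    import heapq
    from collections import Counter

    def huffman_table(symbols):
        """Build a prefix-free code mapping each symbol to a bit string."""
        heap = [[freq, [sym, ""]] for sym, freq in Counter(symbols).items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            lo, hi = heapq.heappop(heap), heapq.heappop(heap)
            for pair in lo[1:]:
                pair[1] = "0" + pair[1]   # left branch
            for pair in hi[1:]:
                pair[1] = "1" + pair[1]   # right branch
            heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
        return {sym: bits for sym, bits in heap[0][1:]}

    codes = [0, 0, 0, 0, 1, 1, 2, 3]          # e.g. quantized weight indices
    table = huffman_table(codes)
    encoded = "".join(table[c] for c in codes)
    print(table, len(encoded))                 # 14 bits vs. 16 with a fixed 2-bit code
    ```
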
  4. Computation

    • spatial domain to frequency domain: convert convolution into pointwise multiplication using the FFT (convolution theorem); see the sketch below.
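
    A 1-D NumPy sketch of the convolution theorem behind this: circular convolution in the spatial domain equals pointwise multiplication of the FFTs. Real CNN libraries use batched 2-D FFTs and handle padding and striding; this only shows the core identity.

    ```python
    import numpy as np

    x = np.random.randn(64)   # input signal
    w = np.random.randn(5)    # convolution kernel

    # spatial domain: circular convolution, O(n * k) multiplications
    direct = np.array([sum(x[(i - j) % len(x)] * w[j] for j in range(len(w)))
                       for i in range(len(x))])

    # frequency domain: FFT, pointwise multiply, inverse FFT, O(n log n)
    via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(w, n=len(x))).real

    assert np.allclose(direct, via_fft)
    ```
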
  5. Sparsity regularization
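
    Presumably the usual approach: add an L1 (or group-lasso) penalty on the weights to the task loss so that many individual weights (or whole channels) are driven to zero during training and can then be pruned. A minimal PyTorch sketch with an L1 penalty (model and hyperparameters are illustrative):

    ```python
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    l1_lambda = 1e-4  # illustrative regularization strength

    def training_step(x, y):
        task_loss = nn.functional.cross_entropy(model(x), y)
        # L1 penalty pushes individual weights toward exactly zero
        l1_penalty = sum(p.abs().sum() for p in model.parameters())
        loss = task_loss + l1_lambda * l1_penalty
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    training_step(torch.randn(32, 784), torch.randint(0, 10, (32,)))
    ```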

  6. Efficient Inference

    • cascade of networks, early-exit networks (predict whether to exit after each layer) [1] [2]; a sketch follows below.
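
    A toy sketch of the early-exit idea (module names and the confidence threshold are hypothetical): attach a small classifier after each block and stop the forward pass as soon as its softmax confidence crosses the threshold, so easy inputs use less computation.

    ```python
    import torch
    import torch.nn as nn

    class EarlyExitNet(nn.Module):
        """Blocks with an auxiliary classifier ("exit") after each one."""
        def __init__(self, dims=(784, 256, 128), n_classes=10):
            super().__init__()
            self.blocks = nn.ModuleList(
                nn.Sequential(nn.Linear(a, b), nn.ReLU()) for a, b in zip(dims, dims[1:]))
            self.exits = nn.ModuleList(nn.Linear(b, n_classes) for b in dims[1:])

        @torch.no_grad()
        def forward(self, x, threshold=0.9):
            for i, (block, exit_head) in enumerate(zip(self.blocks, self.exits)):
                x = block(x)
                probs = exit_head(x).softmax(dim=-1)
                # stop as soon as the current exit is confident enough
                if probs.max() >= threshold or i == len(self.blocks) - 1:
                    return probs, i

    net = EarlyExitNet()
    probs, exit_index = net(torch.randn(1, 784))
    print(f"exited after block {exit_index}")
    ```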

Good introduction slides: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture15.pdf